-
Notifications
You must be signed in to change notification settings - Fork 78
feat(SmtForest) Skeleton out the new forest #767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8316321 to
a67dff8
Compare
This commit primarily exists to sketch out the interface for the new forest implementation, aimed at getting agreement on the interface and basic architecture before implementation begins in earnest. It does this directly on the `LargeSmtForest` type, which will be the user-facing portion of the API and likely later renamed to simply `SmtForest`. As part of doing this, it includes a number of important utility types, including: - `Backend`, which defines the minimal set of features required for the SMT forest's pluggable backends. Backends are aware of tree semantics. - `Operation`, contained in `SmtUpdateBatch` and `SmtForestUpdateBatch` which define sets of modifications to be made to individual trees in the forest and the entire forest respectively. - `InMemoryPrefix`, which implements the in-memory portion of a tree for the backends that keep only some prefix in memory. This was originally implemented for a different approach, but the code is still valid and will be useful in the future, and so is still included. - A small set of utility types: `SubtreeLevels`, `RootInfo`, and `LinearIndex`. These were also implemented for a different approach but will be useful in the future. - An assortment of error types to ensure strongly-typed error handling throughout the SMT forest API.
a67dff8 to
312cb33
Compare
|
I wasn't sure if it made sense to include the extra bits of implementation that will be useful in the future. If you don't want them as part of this PR I can pull them out until they are needed; my main worry was just them bitrotting and ending up with a difficult merge/rebase when they do get used. Let me know! |
huitseeker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice simplification compared to the Storage trait from PR #700.
The decision to make backends "aware of tree semantics" (as the module doc says) but not responsible for the actual tree logic seems like a good middle ground.
| /// the provided `backend`. | ||
| pub fn new(_backend: B) -> Result<Self, B::Error> { | ||
| todo!("LargeSmtForest::new") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at issue #670, the design sketch shows an add_tree method for adding new independent trees to the forest:
fn add_tree(&self, leaves: LeafMap) -> Result<Root>;I do not see an equivalent in this skeleton. How would a caller add a brand new tree (not derived from modifying an existing one) to the forest? Is the idea that you can call modify_tree on the empty tree root to create one, or is add_tree planned for a follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan is currently that you just modify tree on an empty root. Is add_tree needed or useful?
| /// `operations`. | ||
| /// - If applying the provided `operations` results in no changes to a given lineage of trees in | ||
| /// the forest, then no new tree must be allocated in that lineage. | ||
| fn modify_forest( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new Backend trait is simpler and does not have explicit begin/commit methods. But I am curious how error recovery would work for persistent backends. If modify_forest partially succeeds (say, updates 3 of 5 trees) and then hits an error, what is the expected state? Should the trait doc clarify whether implementations must be atomic (all-or-nothing) or whether partial failures are allowed?
(this is the thrust of my comments on abort /rollback in #700, BTW — though I'm more interested in the atomicity contract in the abstract sense than how that's conveyed in the API, here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking on this is that atomicity is now an internal concept to the Backend. It was never intended to be user-facing in the previous design, just exposed to the forest itself, as I don't think we ever want to contend with user-facing rollback.
The main reason the concept exists is to prevent data corruption in the persistent backend. If something like your above scenario occurs in the persistent backend the whole transaction will be backed out.
That said, I think documenting something is probably valuable. What would you expect to happen for the in-memory backend should updating the 4th tree fail? To roll back to the previous state, or allow the partial delta? The same question applies if the backend is part-way through applying operations and fails.
I'm not really an API client here, so understanding how this should behave is important to me!
bobbinth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I left some comments inline - but we are pretty close.
I did not review InMemoryPrefix or subtree-related code. I'd probably remove them from this PR because this is more about aligning on the LargeSmtForest interface, and it would speed up review/merge process.
| // We start by clearing any history for which the `version` corresponds to the latest | ||
| // version and hence the full tree. | ||
| self.backend.versions()?.for_each(|(root, v)| { | ||
| if v == version { | ||
| self.histories | ||
| .get_mut(&root) | ||
| .expect( | ||
| "A full tree did not have a corresponding history, but is required | ||
| to", | ||
| ) | ||
| .clear(); | ||
| } | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me why we need this. Wouldn't code on lines 354 - 357 below do the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not currently. The history will refuse to remove the last delta as it can't know if the truncation point it is given is between the latest delta and the current version, or whether it is newer than the current version.
| // Then we just run through all the histories and truncate them to this version if needed, | ||
| // which provides the correct behaviour. | ||
| self.histories.values_mut().for_each(|h| { | ||
| h.truncate(version); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the history becomes empty post truncation, should we remove it from self.histories map. Or is the idea to keep histories for all roots managed by the forest - even if some histories are empty?
If we do keep entries for all roots, it may be good to have a separate map which keeps track of trees with non-empty histories. This way, we won't need to iterate over all histories here (which could be expensive because there could be many millions of trees in the forest).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had intended that we keep histories for all lineages in the forest as it simplifies history operation by not having to deallocate/reallocate histories. I had a comment stating this, but it clearly got lost in the shuffle! I've added it back, and also added a set tracking which histories actually contain deltas.
This commit predominantly cleans up a few elements of the forest design, but also removes some code that was originally included for later usage.
a700d29 to
59a9c3e
Compare
|
I've pushed a commit that addresses the majority of comments left above. There are some open discussions in my mind with regards to particular behaviours, so if we could clarify those that would be great! |
bobbinth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I left a coupe more small comments inline.
But, I just realized that I missed one major thing: I don't think we can actually use tree roots to identify trees. This was possible in the old implementation of the SmtForest where subtrees between trees were shared. But in this implementation, it is perfectly possible for two (or more) trees to exist with the same root but actually be distinct logical trees.
The main example of this is that we have many accounts and it may so happen that two accounts end up having the same exact storage trees. This could happen at the same version or at different versions. For example, account A could have tree with root R1 at block 10, and account B could have a tree with root R1 at block 20. Or both of them could have trees with roots R2 at block 30.
The fact that there are multiple trees doesn't really matter for querying, but it does matter for updates. For example, if both accounts have the R2 tree at block 30 and account B gets updated, it's tree may become R3 at block 31 - but this should affect account As tree.
I haven't thought too much about it yet, but I think a solution could be to just pass something like context or domain together with a tree root. So, instead of saying "I want to do something with a tree with root R1" we'd be saying "I want to do something with a tree with root R1 in the context A".
I believe this should address the use cases of miden-node as we always know in the context of which account we are querying the SmtForest (but @drahnr, please confirm).
So, there are two ways to proceed:
- We can merge this PR and then make a follow-up PR to fix the shortcomings.
- We can fix things in this PR.
I have a slight preference for the first approach.
Describe your changes
This commit primarily exists to sketch out the interface for the new forest implementation, aimed at getting agreement on the interface and basic architecture before implementation begins in earnest. It does this directly on the
LargeSmtForesttype, which will be the user-facing portion of the API and likely later renamed to simplySmtForest.As part of doing this, it includes a number of important utility types, including:
Backend, which defines the minimal set of features required for the SMT forest's pluggable backends. Backends are aware of tree semantics.Operation, contained inSmtUpdateBatchandSmtForestUpdateBatchwhich define sets of modifications to be made to individual trees in the forest and the entire forest respectively.InMemoryPrefix, which implements the in-memory portion of a tree for the backends that keep only some prefix in memory. This was originally implemented for a different approach, but the code is still valid and will be useful in the future, and so is still included.SubtreeLevels,RootInfo, andLinearIndex. These were also implemented for a different approach but will be useful in the future.Contains work toward the completion of #670. I have set it as "no changelog" as it is not yet intended to be part of the public API.
Checklist before requesting a review
Repo forkedand branch created fromnextaccording to naming convention.